Rho-domain Metrics

ABSTRACT

Video encoders, systems and methods are described that characterize video encoding processes using a ρ-domain deviation metric. The deviation metric represents a weighted difference between actual non-zero coefficients and the expected non-zero coefficients, the actual and expected coefficients corresponding to quantization of a macroblock in a video frame during video encoding of the frame. The deviation metric is used to adjust the video encoding process to obtain an optimized encoding bit rate for a desired video encoding quality by selecting a quantizing parameter based on a normalized value of the deviation metric. The quantizing parameter can be selected from a table indexed using the deviation metric.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority from PCT/CN2010/076564 (title: “Rho-Domain Metrics”) which was filed in the Chinese Receiving Office on Sep. 2, 2010, from PCT/CN2010/076569 (title: “Video Classification Systems and Methods”) which was filed in the Chinese Receiving Office on Sep. 2, 2010, from PCT/CN2010/076555 (title: “Video Analytics for Security Systems and Methods”) which was filed in the Chinese Receiving Office on Sep. 2, 2010, and from PCT/CN2010/076567 (title: “Systems And Methods for Video Content Analysis”) which was filed in the Chinese Receiving Office on Sep. 2, 2010, each of these applications being hereby incorporated herein by reference. The present application is also related to concurrently filed U.S. patent non-provisional applications entitled “Video Classification Systems and Methods” (attorney docket no. 043497-0393274), “Video Analytics for Security Systems and Methods” (attorney docket no. 043497-0393277) and “Systems And Methods for Video Content Analysis” (attorney docket no. 043497-0393278), which are expressly incorporated by reference herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example of a frame illustrating presence of non-zero coefficients (NZ) in macroblocks.

FIG. 2 is a chart showing an example of an exponential relationship of NZ(ρ) and quantization parameters.

FIG. 3 is a chart showing a ρ-domain deviation metric θ as a recursive weighted difference between the theoretical and actual values.

FIG. 4 is a simplified block diagram showing a video generation system.

FIG. 5 is a chart illustrating the linear relationship between video quality and quantization parameter.

FIG. 6 is a flowchart illustrating a process for mode decision algorithm video encoding according to certain aspects of the invention.

FIG. 7 is a simplified block schematic illustrating a processing system employed in certain embodiments of the invention.

DETAILED DESCRIPTION

Embodiments of the present invention will now be described in detail with reference to the drawings, which are provided as illustrative examples so as to enable those skilled in the art to practice the invention. Notably, the figures and examples below are not meant to limit the scope of the present invention to a single embodiment, but other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to same or like parts. Where certain elements of these embodiments can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the disclosed embodiments will be described, and detailed descriptions of other portions of such known components will be omitted so as not to obscure the disclosed embodiments. In the present specification, an embodiment showing a singular component should not be considered limiting; rather, the invention is intended to encompass other embodiments including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein. Moreover, applicants do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such. Further, certain embodiments of the present invention encompass present and future known equivalents to the components referred to herein by way of illustration.

Certain embodiments of the invention provide an innovative ρ-domain metric “θ” and systems and methods that apply the metric. In some embodiments, the definition of ρ in ρ-domain can be taken to be the number of non-zero coefficients after transform and quantization in a video encoding process. Additionally, the term “NZ” will be used herein to represent ρ, where NZ can be understood as meaning a number of non-zero coefficients after quantization of each 16×16 pixel macroblock (“MB”) in video standards such as the H.264 video standard.
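
By way of illustration only, the per-macroblock NZ count may be sketched as follows. This is a minimal sketch rather than the implementation of any embodiment. It assumes that the quantized coefficients of a frame are available as a two-dimensional array whose dimensions are multiples of the macroblock size. The function name count_nz_per_macroblock is hypothetical.

```python
def count_nz_per_macroblock(coeffs, mb_size=16):
    """Count non-zero quantized coefficients (NZ) in each macroblock.

    coeffs is assumed to be a 2D list of quantized coefficients for one
    frame; the result holds one NZ value per mb_size x mb_size macroblock.
    """
    height = len(coeffs)
    width = len(coeffs[0]) if height else 0
    if height % mb_size or width % mb_size:
        raise ValueError("frame dimensions must be multiples of mb_size")

    counts = []
    for top in range(0, height, mb_size):
        row = []
        for left in range(0, width, mb_size):
            # Count every coefficient in this macroblock that is not zero.
            nz = sum(
                1
                for y in range(top, top + mb_size)
                for x in range(left, left + mb_size)
                if coeffs[y][x] != 0
            )
            row.append(nz)
        counts.append(row)
    return counts
```

A frame-level NZ value, where one is needed, can be obtained by summing the per-macroblock counts.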

An example illustrating NZ calculation is shown in FIG. 1. It has been shown through theory and experiment that ρ has a linear relationship with the video texture encoding bit rate. Generally, proposed ρ-domain source models consider the bit rate R as a function of ρ, which is the percentage of zeros among the quantized coefficients. Observations show that ρ monotonically increases with quantization step-size QP, which implies there is a one-to-one mapping between them. Accordingly, certain embodiments provide a frame-level rate control algorithm based on these properties. Some embodiments can employ any of a plurality of suitable algorithms that may improve the accurate estimation of the R-ρ function. The relationship of NZ(ρ) and QP can be modeled by exponential equations, as illustrated in FIG. 2. The dotted curve 22 represents actual frame-level NZ vs. QP points from encoding, while the solid curve 23 represents exponential function modeling. A table of NZ vs. QP can be obtained from the exponential model.
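
One possible way of obtaining the NZ vs. QP table from measured points is sketched below. The specification does not give the form of the exponential equations, so the sketch assumes, purely for illustration, a decaying model NZ(QP) = a·exp(−b·QP) fitted by least squares on ln(NZ). The names fit_exponential and build_nz_qp_table are hypothetical, and the default QP range of 0 to 51 is the H.264 range.

```python
import math


def fit_exponential(qps, nzs):
    """Fit NZ = a * exp(-b * QP) by linear least squares on ln(NZ).

    The model form is an assumption made for this sketch. Points with
    NZ <= 0 are skipped because their logarithm is undefined.
    """
    points = [(q, math.log(n)) for q, n in zip(qps, nzs) if n > 0]
    if len(points) < 2:
        raise ValueError("need at least two points with NZ > 0")

    count = len(points)
    mean_q = sum(q for q, _ in points) / count
    mean_l = sum(l for _, l in points) / count
    sxx = sum((q - mean_q) ** 2 for q, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct QP values")
    sxy = sum((q - mean_q) * (l - mean_l) for q, l in points)

    slope = sxy / sxx                      # slope of ln(NZ) against QP
    a = math.exp(mean_l - slope * mean_q)  # intercept mapped back from log space
    return a, -slope


def build_nz_qp_table(a, b, qp_min=0, qp_max=51):
    """Tabulate the modeled NZ for every integer QP in [qp_min, qp_max]."""
    return {qp: a * math.exp(-b * qp) for qp in range(qp_min, qp_max + 1)}
```

The resulting table plays the role of the theoretical NZ_QP curve against which actual NZ values can later be compared.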

In certain embodiments, a ρ-domain deviation metric θ may be defined as a recursive weighted ratio of the theoretical NZ_QP curve and the actual NZ_QP curve, as illustrated in FIG. 3. One curve 33 represents the theoretical NZ_QP curve and a second curve 32 represents an actual NZ vs. QP curve measured during encoding. One difference between curves 32 and 33 may be denoted as deviation metric θ. Deviation metric θ can be employed in video encoding to determine the motion complexity of a video sequence, to determine the encoded video quality, to determine short scene cut, and to determine the actual QP used to best meet the predefined bit rate budget.

Interoperation and interaction of elements in one example of a video generation system utilizing ρ-domain deviation metric θ and its applications is depicted in FIG. 4. The combination of hardware and software employed is typically determined according to requirements of the application and the configuration shown in FIG. 4 is provided for the sole purpose of simplifying description. A video encoder 400 typically generates a number of non-zero coefficients (NZ) per MB and/or per frame as a byproduct of its video encoding process. The NZ information is processed and ρ-domain deviation metric θ is calculated as meta-data 402 to feed into various algorithms of interest.

In certain embodiments, various features can be associated with ρ-domain deviation metric θ and certain advantages can be derived from these features. Moreover, hardware, software and individual algorithms can be optimized to maximize the derived advantages. In certain embodiments, θ can be used to categorize video motion complexity. A mode decision system 404 may employ ρ-domain metric θ to obtain an optimized decision process. Deviation θ can be defined as the weighted difference of a theoretical NZ_QP curve from an actual curve obtained from the encoding process. Normalized θ fluctuates around a value of 1.0. A value of θ that is smaller than 1.0 indicates that the actual encoded bit rate is larger than expected, implying that more complicated motion contextual content has been encountered. A value of θ that is larger than 1.0 indicates that fewer non-zero coefficients are encoded, implying that smoother motion content has been encountered.
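
A recursive update of a normalized θ may be sketched as follows. The exact weighting is not given in this description, so the sketch assumes an exponentially weighted running average of the ratio of expected NZ to actual NZ. That ratio direction is chosen so that θ falls below 1.0 when more non-zero coefficients than expected are produced, in line with the paragraph above. The name update_theta and the default weight of 0.75 are hypothetical.

```python
def update_theta(prev_theta, expected_nz, actual_nz, weight=0.75):
    """Return the updated deviation metric after one frame or macroblock.

    Assumed recursion: theta = weight * prev_theta
                               + (1 - weight) * (expected_nz / actual_nz)
    """
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    if expected_nz <= 0:
        raise ValueError("expected_nz must be positive")

    # Guard against division by zero when a unit has no non-zero coefficients.
    instantaneous = expected_nz / max(actual_nz, 1)
    return weight * prev_theta + (1.0 - weight) * instantaneous
```

Starting from θ = 1.0, a unit that produces exactly the expected NZ leaves θ unchanged, a unit that produces twice the expected NZ pulls θ below 1.0, and a unit that produces half the expected NZ pushes θ above 1.0.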

In some embodiments, θ can be used to calculate an encoded video quality curve Lq (see FIG. 5). Encoded video quality Q has a linear relationship with the quantization parameter QP used in an encoding process such as that shown in FIG. 4. A linear model Q_QP can be obtained from experimental data. The Q_QP linear model can be adjusted based on the deviation θ; that is, the quality and QP relationship is a function of the motion complexity of the video content to be encoded. The adjusted Q_QP model can serve as the target quality curve of the video content. If a target quality is set, then the actual QP becomes a function of deviation θ, and a table of QP and θ can be derived. A target quality video encoding algorithm can be achieved using a simple table lookup operation.
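
The table lookup operation may be sketched as follows for a single target quality. The bin edges and QP values shown are hypothetical placeholders and are not taken from experimental data. In practice the table would be derived from the adjusted Q_QP model. The name lookup_qp is likewise hypothetical.

```python
import bisect

# Hypothetical table for one target quality. THETA_EDGES holds the upper
# edges of the theta bins; QP_FOR_BIN holds one QP per bin, so it has one
# more entry than THETA_EDGES.
THETA_EDGES = [0.6, 0.8, 1.0, 1.2, 1.4]
QP_FOR_BIN = [24, 26, 28, 30, 32, 34]


def lookup_qp(theta):
    """Return the QP of the bin that contains theta."""
    index = bisect.bisect_right(THETA_EDGES, theta)
    return QP_FOR_BIN[index]
```

With these placeholder values, lookup_qp(0.99) selects 28 and lookup_qp(1.0) selects 30, because a value equal to a bin edge is assigned to the bin above that edge.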

In certain embodiments, θ can be used to determine changes in video scene. It can be shown experimentally that the number of non-zero coefficients (NZ) increases multiple times in a scene change P-frame due to a lack of temporal correlation between the scene change frame and its reference frame. Therefore, certain embodiments utilize deviation θ to determine scene change with a good degree of robustness and very low computational complexity.
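
A scene change test of this kind may be sketched as follows. The sketch assumes that a per-frame ratio of expected NZ to actual NZ is available, so that a several-fold increase in NZ appears as a sharp drop in that ratio. The name detect_scene_changes and the default threshold of 0.5 are hypothetical.

```python
def detect_scene_changes(ratios, threshold=0.5):
    """Return the indices of frames whose ratio drops sharply.

    ratios is assumed to hold, per frame, expected NZ divided by actual NZ.
    A frame is flagged when its ratio is lower than that of the previous
    frame by more than threshold.
    """
    changes = []
    for index in range(1, len(ratios)):
        if ratios[index - 1] - ratios[index] > threshold:
            changes.append(index)
    return changes
```

Only one subtraction and one comparison are needed per frame, which is consistent with the low computational complexity noted above.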

Certain embodiments combine θ and the NZ_QP curve to obtain a more accurate bit rate. The NZ_QP curve can be adjusted to reflect a more accurate encoding bit rate for a given video sequence. Therefore, more accurate rate control encoding can be achieved using deviation metric θ.

Example Constant Bit Rate Control

Certain embodiments employ efficient and accurate constant bit rate control methods and algorithms 406 based on deviation θ features described above in relation to video scene changes and by combining θ and the NZ_QP curve. For the purposes of description, a group of pictures (“GOP”) may be defined as a group of pictures starting from an intra-coded frame (“I-frame”), and its following inter-predicted frames (“P/B-frames”). A target bit budget may be assigned to each I or P/B frame in accordance with a target bit rate per GOP. An adjusted NZ_QP table based on the recursively weighted deviation θ can reflect a more accurate content based NZ_QP relationship. A predicted NZ value may be adaptively estimated for a current frame to be encoded, and a quantization parameter QP can be calculated from the NZ_QP curve to control the bit rate for the current frame. If deviation θ changes abruptly above a threshold level, a scene change detection may be indicated and the rate control algorithm can be reset. A cost efficient and robust constant bit rate algorithm can be designed and implemented through the use of deviation θ.
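
The selection of QP for a frame under a bit budget may be sketched as follows. The sketch rests on several assumptions that are not stated in this description: bits are taken to be proportional to NZ through a hypothetical constant bits_per_nz, the theoretical curve is taken to be NZ(QP) = a·exp(−b·QP), and θ is taken to be the ratio of expected to actual NZ, so that the adjusted prediction is the theoretical value divided by θ. The name select_qp is hypothetical.

```python
import math


def select_qp(target_bits, bits_per_nz, theta, a, b, qp_min=0, qp_max=51):
    """Pick the QP whose theta-adjusted NZ prediction best fits the budget."""
    if bits_per_nz <= 0 or theta <= 0:
        raise ValueError("bits_per_nz and theta must be positive")

    # Convert the frame bit budget into a target number of non-zero coefficients.
    target_nz = target_bits / bits_per_nz

    best_qp = qp_min
    best_error = float("inf")
    for qp in range(qp_min, qp_max + 1):
        # Theoretical NZ from the exponential model, corrected by theta.
        predicted_nz = a * math.exp(-b * qp) / theta
        error = abs(predicted_nz - target_nz)
        if error < best_error:
            best_qp, best_error = qp, error
    return best_qp
```

For the same budget, a smaller θ leads to a larger selected QP, since content that produces more non-zero coefficients than expected requires coarser quantization to stay within the budget.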

Example Quality Bound Variable Bit Rate Control

In certain embodiments, where a constant bit rate algorithm is used with video motion of varying complexity, each frame may be assigned and encoded with the same bit rate, resulting in temporal differences in video quality. Human visual system theory suggests that human vision is sensitive to change in motion (temporal direction) and textural complexity (spatial video content). Accordingly, a quality bound variable bit rate algorithm 408 (FIG. 4) can be provided by allocating more bits to video frames that are subject to temporal and spatial changes, and by allocating fewer bits to smooth motion and texturally simple video frames, while still maintaining a target minimum video quality (quality bound) by utilizing the metric θ. As described above, algorithms and methods for categorizing video motion complexity can be used to categorize frames with changing motion or texture. As further described above, Q_QP tables and tables of QP and θ can be used to bound the smooth and texturally simple frames by a predefined minimum quality. Deviation θ features described above in relation to video scene changes and combining θ and the NZ_QP curve can be used to control the encoded bits to a target bit rate.
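
One simple way of expressing such an allocation is sketched below. The rule used, in which each frame receives a share of the budget proportional to 1/θ, is an assumption made for illustration and is not taken from this description. The quality bound is represented by a hypothetical maximum QP corresponding to the minimum quality. The names allocate_bits and bound_qp are hypothetical.

```python
def allocate_bits(thetas, total_bits):
    """Split total_bits across frames in proportion to 1 / theta.

    Frames with theta below 1.0 (more complex content) receive more bits
    and frames with theta above 1.0 (smoother content) receive fewer.
    """
    if any(theta <= 0 for theta in thetas):
        raise ValueError("theta values must be positive")
    weights = [1.0 / theta for theta in thetas]
    total_weight = sum(weights)
    return [total_bits * weight / total_weight for weight in weights]


def bound_qp(qp, max_qp_for_min_quality):
    """Clamp QP so that quality does not fall below the predefined minimum."""
    return min(qp, max_qp_for_min_quality)
```

For example, with θ values of 1.0, 0.5 and 2.0 and a budget of 3500 bits, the three frames receive 1000, 2000 and 500 bits respectively.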

Example Network Adaptive Variable Frame Rate Control

Network fluctuation can severely affect a user's quality of perception (“QOP”) when playing back network transmitted video streams. To accommodate network fluctuation, a network adaptive variable frame rate algorithm 410 (FIG. 4) can be designed using rho-domain metric θ. With reference to FIG. 6, certain of the systems and methods described herein may be employed to obtain a suitable variable frame rate algorithm. At step 600, the network provides feedback information comprising user defined minimum video quality, video channel priority and network bandwidth availability. At step 602, a quantization parameter QP is calculated based on the deviation θ and its corresponding rate control implementation. At step 604, and based on deviation θ, video motion complexity can be categorized, and a new quantization parameter QP_1 with respect to the minimum quality requirement may be calculated. At step 606, the quantization parameter difference (“Diff_QP”) between QP and QP_1 is calculated. A new frame rate to be encoded can be obtained based on Diff_QP and the content of a precalculated Diff_QP v. frame rate table. In certain embodiments, a high priority channel's frame rate is maintained unchanged as far as possible. If a larger Diff_QP is encountered, a downsizing of encoding picture resolutions can be recommended and/or performed. Downsizing of picture resolutions can include, for example, downsizing from full-size D1 resolution to common intermediate format CIF resolution.
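
Steps 606 onward may be sketched as follows. All numeric values in the sketch, including the Diff_QP bin edges, the frame rates and the downsizing threshold, are hypothetical placeholders for a precalculated table. The sketch further assumes that Diff_QP is computed as QP minus QP_1 and that a positive value means the rate control QP is coarser than the minimum quality allows. The name choose_frame_rate is hypothetical.

```python
import bisect

# Hypothetical precalculated Diff_QP vs. frame rate table.
DIFF_QP_EDGES = [2, 5, 8]        # upper edges of the Diff_QP bins
FRAME_RATES = [30, 20, 15, 10]   # frames per second for each bin
DOWNSIZE_DIFF_QP = 10            # Diff_QP at which downsizing is recommended


def choose_frame_rate(qp, qp_min_quality, current_fps, high_priority=False):
    """Return (frame_rate, recommend_downsize) for the next encoding period."""
    diff_qp = qp - qp_min_quality
    recommend_downsize = diff_qp >= DOWNSIZE_DIFF_QP

    # A high priority channel keeps its frame rate, as does any channel
    # whose rate control QP already satisfies the minimum quality.
    if high_priority or diff_qp <= 0:
        return current_fps, recommend_downsize

    index = bisect.bisect_right(DIFF_QP_EDGES, diff_qp)
    return min(current_fps, FRAME_RATES[index]), recommend_downsize
```

With these placeholder values, a Diff_QP of 4 lowers a 30 frames per second channel to 20 frames per second, and a Diff_QP of 12 lowers it to 10 frames per second while also recommending a reduction in resolution, for example from D1 to CIF.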

System Description

Turning now to FIG. 7, certain embodiments of the invention employ a processing system that includes at least one computing system 70 deployed to perform certain of the steps described above. Computing system 70 may be a commercially available system that executes commercially available operating systems such as Microsoft Windows®, UNIX or a variant thereof, Linux, a real time operating system and/or a proprietary operating system. The architecture of the computing system may be adapted, configured and/or designed for integration in the processing system, for embedding in one or more of an image capture system, communications device and/or graphics processing systems. In one example, computing system 70 comprises a bus 702 and/or other mechanisms for communicating between processors, whether those processors are integral to the computing system 70 (e.g. 704, 705) or located in different, perhaps physically separated computing systems 700. Typically, processor 704 and/or 705 comprises a CISC or RISC computing processor and/or one or more digital signal processors. In some embodiments, processor 704 and/or 705 may be embodied in a custom device and/or may perform as a configurable sequencer. Device drivers 703 may provide output signals used to control internal and external components and to communicate between processors 704 and 705.

Computing system 70 also typically comprises memory 706 that may include one or more of random access memory (“RAM”), static memory, cache, flash memory and any other suitable type of storage device that can be coupled to bus 702. Memory 706 can be used for storing instructions and data that can cause one or more of processors 704 and 705 to perform a desired process. Main memory 706 may be used for storing transient and/or temporary data such as variables and intermediate information generated and/or used during execution of the instructions by processor 704 or 705. Computing system 70 also typically comprises non-volatile storage such as read only memory (“ROM”) 708, flash memory, memory cards or the like; non-volatile storage may be connected to the bus 702, but may equally be connected using a high-speed universal serial bus (USB), Firewire or other such bus that is coupled to bus 702. Non-volatile storage can be used for storing configuration, and other information, including instructions executed by processors 704 and/or 705. Non-volatile storage may also include mass storage device 710, such as a magnetic disk, optical disk, flash disk that may be directly or indirectly coupled to bus 702 and used for storing instructions to be executed by processors 704 and/or 705, as well as other information.

In some embodiments, computing system 70 may be communicatively coupled to a display system 712, such as an LCD flat panel display, including touch panel displays, electroluminescent display, plasma display, cathode ray tube or other display device that can be configured and adapted to receive and display information to a user of computing system 70. Typically, device drivers 703 can include a display driver, graphics adapter and/or other modules that maintain a digital representation of a display and convert the digital representation to a signal for driving a display system 712. Display system 712 may also include logic and software to generate a display from a signal provided by system 700. In that regard, display 712 may be provided as a remote terminal or in a session on a different computing system 70. An input device 714 is generally provided locally or through a remote system and typically provides for alphanumeric input as well as cursor control 716 input, such as a mouse, a trackball, etc. It will be appreciated that input and output can be provided to a wireless device such as a PDA, a tablet computer or other system suitably equipped to display the images and provide user input.

In certain embodiments, computing system 70 may be embedded in a system that captures and/or processes images, including video images. In one example, computing system 70 may include a video processor or accelerator 717, which may have its own processor, non-transitory storage and input/output interfaces. In another example, video processor or accelerator 717 may be implemented as a combination of hardware and software operated by the one or more processors 704, 705. In another example, computing system 70 functions as a video encoder, although other functions may be performed by computing system 70. In particular, a video encoder that comprises computing system 70 may be embedded in another device such as a camera, a communications device, a mixing panel, a monitor, a computer peripheral, and so on.

According to one embodiment of the invention, portions of the described invention may be performed by computing system 70. Processor 704 executes one or more sequences of instructions. For example, such instructions may be stored in main memory 706, having been received from a computer-readable medium such as storage device 710. Execution of the sequences of instructions contained in main memory 706 causes processor 704 to perform process steps according to certain aspects of the invention. In certain embodiments, functionality may be provided by embedded computing systems that perform specific functions wherein the embedded systems employ a customized combination of hardware and software to perform a set of predefined tasks. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.

The term “computer-readable medium” is used to define any medium that can store and provide instructions and other data to processor 704 and/or 705, particularly where the instructions are to be executed by processor 704 and/or 705 and/or other peripheral of the processing system. Such medium can include non-volatile storage, volatile storage and transmission media. Non-volatile storage may be embodied on media such as optical or magnetic disks, including DVD, CD-ROM and BluRay. Storage may be provided locally and in physical proximity to processors 704 and 705 or remotely, typically by use of a network connection. Non-volatile storage may be removable from computing system 70, as in the example of BluRay, DVD or CD storage or memory cards or sticks that can be easily connected or disconnected from a computer using a standard interface, including USB, etc. Thus, computer-readable media can include floppy disks, flexible disks, hard disks, magnetic tape, any other magnetic medium, CD-ROMs, DVDs, BluRay, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH/EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.

Transmission media can be used to connect elements of the processing system and/or components of computing system 70. Such media can include twisted pair wiring, coaxial cables, copper wire and fiber optics. Transmission media can also include wireless media such as radio, acoustic and light waves. In particular radio frequency (RF), fiber optic and infrared (IR) data communications may be used.

Various forms of computer readable media may participate in providing instructions and data for execution by processor 704 and/or 705. For example, the instructions may initially be retrieved from a magnetic disk of a remote computer and transmitted over a network or modem to computing system 70. The instructions may optionally be stored in a different storage or a different part of storage prior to or during execution.

Computing system 70 may include a communication interface 718 that provides two-way data communication over a network 720 that can include a local network 722, a wide area network or some combination of the two. For example, an integrated services digital network (ISDN) may be used in combination with a local area network (LAN). In another example, a LAN may include a wireless link. Network link 720 typically provides data communication through one or more networks to other data devices. For example, network link 720 may provide a connection through local network 722 to a host computer 724 or to a wide area network such as the Internet 728. Local network 722 and Internet 728 may both use electrical, electromagnetic or optical signals that carry digital data streams.

Computing system 70 can use one or more networks to send messages and data, including program code and other information. In the Internet example, a server 730 might transmit a requested code for an application program through Internet 728 and may receive in response a downloaded application that provides or augments functional modules such as those described in the examples above. The received code may be executed by processor 704 and/or 705.

ADDITIONAL DESCRIPTIONS OF CERTAIN ASPECTS OF THE INVENTION

The foregoing descriptions of the invention are intended to be illustrative and not limiting. For example, those skilled in the art will appreciate that the invention can be practiced with various combinations of the functionalities and capabilities described above, and can include fewer or additional components than described above. Certain additional aspects and features of the invention are further set forth below, and can be obtained using the functionalities and components described in more detail above, as will be appreciated by those skilled in the art after being taught by the present disclosure.

Certain embodiments of the invention provide video encoders, systems and methods for characterizing video encoding processes. Some of these embodiments comprise maintaining information relating a plurality of non-zero coefficients expected from quantization of a macroblock to one or more quantization parameters used in a video encoding process. Some of these embodiments comprise generating actual non-zero coefficients during video encoding of the macroblock. Some of these embodiments comprise calculating a deviation metric representing a weighted difference between the actual non-zero coefficients and the expected non-zero coefficients. Some of these embodiments comprise adjusting the video encoding process using the deviation metric. In some of these embodiments, the video encoding process is adjusted to obtain an optimized encoding bit rate for a desired video encoding quality.

In some of these embodiments, adjusting the video encoding process using the deviation metric includes adjusting the quantizing parameter based on a normalized value of the deviation metric. In some of these embodiments, the relationship between video encoding quality and the quantizing parameter is a function of motion complexity of a sequence of video frames to be encoded. In some of these embodiments, the normalized deviation metric value varies around a value of 1.0. In some of these embodiments, a normalized deviation metric value greater than 1.0 is indicative of a larger than expected encoded bit rate. In some of these embodiments, an increase in the normalized deviation metric value is indicative of an increase in complexity of motion contextual content. In some of these embodiments, the quantizing parameter is a function of the deviation metric. In some of these embodiments, adjusting the video encoding process using the deviation metric includes selecting a quantizing parameter using the deviation metric to index a table.

Some of these embodiments comprise the step of selecting an encoding mode using the deviation metric. In some of these embodiments, the encoding mode is selected to maintain a constant bit rate for frame encoding. Some of these embodiments comprise the step of allocating bits to frames based on temporal and spatial changes between a sequence of frames. In some of these embodiments, the bits are allocated to maintain a target minimum video quality.

Certain embodiments of the invention provide a video encoder and related methods. Some of these embodiments comprise a storage configured to maintain information relating a plurality of non-zero coefficients expected from quantization of a macroblock to one or more quantization parameters used in a video encoding process. Some of these embodiments comprise an encoder configured to receive a sequence of video frames and to encode macroblocks within the video frames. In some of these embodiments, the encoder generates actual non-zero coefficients during video encoding of the macroblocks. Some of these embodiments comprise a table of quantization parameters controlled by the encoder. In some of these embodiments, the encoder selects a quantization parameter for a current macroblock using a deviation metric representing a weighted difference between the actual non-zero coefficients and the expected non-zero coefficients. In some of these embodiments, the video encoding process is adjusted to obtain an optimized encoding bit rate for a desired video encoding quality.

In some of these embodiments, the quantizing parameter is selected using a normalized value of the deviation metric. In some of these embodiments, the quantizing parameter is selected to achieve a target video encoding quality. In some of these embodiments, video encoding quality and quantizing parameters are related by a function of motion complexity of the sequence of video frames. In some of these embodiments, the method is performed by a processor in a video encoder that is configured to execute one or more computer program modules.

Although the present invention has been described with reference to specific exemplary embodiments, it will be evident to one of ordinary skill in the art that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense. 

What is claimed is:
 1. A method of video encoding performed by a video encoder comprising: providing in non-transitory storage information relating a plurality of expected non-zero coefficients obtained from quantization of a macroblock to one or more quantization parameters used in a video encoding process; generating information relating a plurality of actual non-zero coefficients obtained during video encoding of the macroblock to the one or more quantization parameters; calculating a deviation metric comprising a ratio of the actual non-zero coefficients and the expected non-zero coefficients for one or more quantization step size; and adjusting the video encoding process using the deviation metric to obtain an encoding bit rate for a desired video encoding quality.
 2. The method of claim 1, wherein the deviation metric comprises a recursive weighted ratio and wherein adjusting the video encoding process includes using the deviation metric to obtain an optimized encoding bit rate for a predefined bit rate budget.
 3. The method of claim 1, wherein adjusting the video encoding process using the deviation metric includes adjusting at least one quantizing parameter based on a normalized deviation metric value.
 4. The method of claim 3, wherein the relationship between video encoding quality and the at least one quantizing parameter is a function of motion complexity of a sequence of video frames to be encoded.
 5. The method of claim 3, wherein the normalized deviation metric value varies around a value of 1.0, wherein a normalized deviation metric value greater than 1.0 is indicative of a larger than expected encoded bit rate.
 6. The method of claim 3, wherein an increase in the normalized deviation metric value is indicative of an increase in complexity of motion contextual content.
 7. The method of claim 3, wherein the quantizing parameter is a function of the deviation metric.
 8. The method of claim 3, wherein adjusting the video encoding process using the deviation metric includes selecting a quantizing parameter using the deviation metric to index a table relating quantization parameters and deviation metrics for one or more target qualities.
 9. The method of claim 1, further comprising the step of selecting an encoding mode using the deviation metric, wherein the encoding mode is selected to maintain a constant bit rate for frame encoding.
 10. The method of claim 1, further comprising the step of allocating bits to frames based on temporal and spatial changes between a sequence of frames, wherein the bits are allocated to maintain a target minimum video quality.
 11. A video encoder comprising: an encoder configured to receive a sequence of video frames and to encode macroblocks within the video frames, wherein the encoder generates actual non-zero coefficients during video encoding of the macroblocks; and a non-transitory storage adapted to maintain a table of quantization parameters, wherein the encoder is configured to select a quantization parameter from the table for a current macroblock using a deviation metric representing a weighted difference between actual non-zero coefficients generated for the current macroblock and non-zero coefficients expected for the current macroblock for one or more quantization step sizes, wherein the selected quantization parameter is used to select an encoding bit rate used by the encoder for a desired video encoding quality.
 12. The video encoder of claim 11, wherein the quantizing parameter is selected using a normalized value of the deviation metric and wherein the deviation metric comprises a recursive weighted ratio of actual and expected non-zero coefficients.
 13. The video encoder of claim 11, wherein the quantizing parameter is selected to achieve a target video encoding quality.
 14. The video encoder of claim 13, wherein video encoding quality and quantizing parameters are related by a function of motion complexity of the sequence of video frames.
 15. A non-transitory computer-readable medium encoded with data and instructions wherein the data and instructions, when executed by a processor of a video encoder, cause the video encoder to perform a video encoding method comprising: generating actual non-zero coefficients during video encoding of a macroblock using a selected quantization parameter; calculating a deviation metric representing a weighted difference between the actual non-zero coefficients and non-zero coefficients that were expected for the selected quantization parameter; using the deviation metric to obtain an encoding bit rate and a desired video encoding quality for the macroblock.
 16. The method of claim 15, wherein the deviation metric used to obtain an optimized encoding bit rate and a desired video encoding quality for the macroblock comprises a normalized recursive ratio of actual and expected non-zero coefficients.
 17. The method of claim 16, wherein using the deviation metric to obtain a desired video encoding quality for the macroblock includes selecting a quantization parameter based on motion complexity of a sequence of video frames.
 18. The method of claim 16, wherein the deviation metric is normalized and has a value that varies around a value of 1.0, wherein a normalized deviation metric value greater than 1.0 is indicative of a larger than expected encoded bit rate.
 19. The method of claim 16, wherein an increase in the normalized deviation metric value is indicative of an increase in complexity of motion contextual content. 